BDMS-626: Improve validation, schema alignment, and well inventory handling#596
ksmuczynski wants to merge 38 commits into `staging` from `kas-BDMS-626-resolve-database-errors`
Conversation
- Introduced `validation_alias` with `AliasChoices` for selected fields (`well_status`, `sampler`, `measurement_date_time`, `mp_height`) to allow alternate field names. - Ensured alignment with schema validation updates.
- Introduced unit tests for `WellInventoryRow` alias mappings. - Verified correct handling of alias fields like `well_hole_status`, `mp_height_ft`, and others. - Ensured canonical fields take precedence when both alias and canonical values are provided.
… and new fields - Added `flexible_lexicon_validator` to support case-insensitive validation of enum-like fields. - Introduced new fields: `OriginType`, `WellPumpType`, `MonitoringStatus`, among others. - Updated existing fields to use flexible lexicon validation for improved consistency. - Adjusted `WellInventoryRow` optional fields handling and validation rules. - Refined contact field validation logic to require `role` and `type` when other contact details are provided.
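The case-insensitive matching could look roughly like the standalone sketch below; `flexible_lexicon_validator` here is a simplified stand-in for the project's validator, which is attached to pydantic fields:

```python
def flexible_lexicon_validator(valid_terms: set[str]):
    """Build a case-insensitive, whitespace-tolerant lexicon check.

    Simplified sketch: normalizes input, then returns the canonical
    spelling of the matched term so downstream code sees one form.
    """
    canonical = {term.strip().lower(): term for term in valid_terms}

    def validate(value):
        if value is None or str(value).strip() == "":
            return None  # blank values are treated as missing
        key = str(value).strip().lower()
        if key not in canonical:
            raise ValueError(f"{value!r} is not a recognized lexicon term")
        return canonical[key]

    return validate


check = flexible_lexicon_validator(
    {"Steel-tape measurement", "Electric sounder"}
)
```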
…dations - Refined validation error handling to provide more detailed feedback in test assertions. - Adjusted test setup to ensure accurate validation scenarios for contact and water level fields. - Updated contact-related tests to validate new composite field error messages.
- Renamed "Water" to "Water Bearing Zone" and refined its definition. - Added new term "Water Quality" under `note_type` category.
… to prevent cross-test collisions - Supports BDD test suite stability - Added hashing mechanism to append unique suffix to `well_name_point_id` for scenario isolation. - Integrated pandas for robust CSV parsing and content modifications when applicable. - Ensured handling preserves existing format for IDs ending with `-xxxx`. - Maintained existing handling for empty or non-CSV files.
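The suffix-hashing idea might be sketched as follows; the function name, hash choice, and the exact `-xxxx` placeholder handling are assumptions based on the commit message:

```python
import hashlib


def isolate_well_id(well_id: str, scenario_name: str) -> str:
    """Append a short, deterministic per-scenario suffix to a well ID.

    Sketch of the isolation idea: empty IDs and autogen "-xxxx"
    placeholders pass through unchanged (assumed convention); other
    IDs get a 4-character hash of the scenario name so parallel BDD
    scenarios cannot collide on primary keys.
    """
    if not well_id or well_id.lower().endswith("-xxxx"):
        return well_id
    suffix = hashlib.sha1(scenario_name.encode("utf-8")).hexdigest()[:4]
    return f"{well_id}-{suffix}"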
…ollback side effects - Supports transaction management - Moved `session.refresh` calls under `commit` condition to streamline database session operations. - Reorganized `session.rollback` logic to properly align with commit flow.
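The commit/refresh reorganization can be illustrated with a toy session object (a stand-in for a SQLAlchemy session, not the real helper):

```python
class FakeSession:
    """Records calls so we can see what a helper does to the session."""

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")

    def refresh(self, obj):
        self.log.append("refresh")


def persist(session, obj, commit=True):
    """Sketch: refresh only after a real commit, rollback only if we
    own the transaction. With commit=False the caller's outer
    transaction stays untouched."""
    try:
        if commit:
            session.commit()
            session.refresh(obj)  # refresh is meaningless pre-commit
    except Exception:
        if commit:
            session.rollback()
        raise
    return obj
```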
…ory source fields in support of schema alignment and database mapping - Update well inventory CSV files to correct data inconsistencies and improve schema alignment. - Added support for `Sample`, `Observation`, and `Parameter` objects within well inventory processing. - Enhanced elevation handling with optional and default value logic. - Introduced `release_status`, `monitoring_status`, and validation for derived fields. - Updated notes handling with new cases and refined content categorization. - Improved `depth_to_water` processing with associated sample and observation creation. - Refined lexicon updates and schema field adjustments for better data consistency.
…h 1 well - Updated BDD tests to reflect changes in well inventory bulk upload logic, allowing the import of 1 well despite validation errors. - Modified step definitions for more granular validation on imported well counts. - Enhanced error message detail in responses for validation scenarios. - Adjusted sample CSV files to match new import logic and validation schema updates. - Refined service behavior to improve handling of validation errors and partial imports.
Pull request overview
This PR updates the well-inventory CSV ingestion pipeline to better align schema validation with lexicon-backed enums, persist previously-missed fields (notes, monitoring/public availability, water level observations), and change the import behavior to “best-effort” so individual row failures don’t abort the entire upload.
Changes:
- Added row-level savepoints and improved error reporting so invalid rows are skipped while valid rows are persisted.
- Updated the `WellInventoryRow` schema to relax/adjust requirements and introduce lexicon-backed enum parsing plus CSV field aliases.
- Updated BDD/unit tests and test CSV fixtures to reflect new validation rules and partial-success behavior.
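The row-level savepoint pattern can be sketched with stdlib `sqlite3`; the real code uses SQLAlchemy's nested transactions, so this is the idea, not the implementation:

```python
import sqlite3


def best_effort_import(conn, rows):
    """Wrap each row in its own SAVEPOINT so one bad row is rolled
    back alone and the remaining rows still persist."""
    imported, errors = [], []
    for i, row in enumerate(rows):
        conn.execute(f"SAVEPOINT row_{i}")
        try:
            conn.execute(
                "INSERT INTO wells (name) VALUES (?)", (row["name"],)
            )
            conn.execute(f"RELEASE SAVEPOINT row_{i}")
            imported.append(row["name"])
        except Exception as exc:
            conn.execute(f"ROLLBACK TO SAVEPOINT row_{i}")
            conn.execute(f"RELEASE SAVEPOINT row_{i}")
            errors.append({"row": i, "error": str(exc)})
    conn.commit()  # the outer transaction commits the surviving rows
    return imported, errors


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wells (name TEXT PRIMARY KEY)")
# The duplicate "W-1" violates the primary key but only its row fails.
imported, errors = best_effort_import(
    conn, [{"name": "W-1"}, {"name": "W-1"}, {"name": "W-2"}]
)
```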
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| `services/well_inventory_csv.py` | Best-effort import via nested savepoints; mapping updates for release status, notes, monitoring status, and water level observation persistence |
| `schemas/well_inventory.py` | Schema optionality updates, lexicon enum coercion, contact + water level validation adjustments, and alias handling |
| `services/thing_helper.py` | Adds monitoring status history writes; adjusts commit/rollback behavior for outer-transaction support |
| `services/contact_helper.py` | Adjusts commit/rollback behavior for outer-transaction support |
| `cli/service_adapter.py` | Improves exit code and stderr reporting for partial success and validation failures |
| `tests/test_well_inventory.py` | Adds schema alias tests and helper row builder |
| `tests/features/well-inventory-csv.feature` | Updates expected partial-success outcomes in negative scenarios |
| `tests/features/steps/*.py` | Updates step definitions for partial success and loosens validation-error matching |
| `tests/features/data/*.csv` | Refreshes fixture data to match new lexicon/validation expectations |
| `core/lexicon.json` | Adds a new `note_type` term |
… unique well name suffixes in well inventory scenarios - Updated `pd.read_csv` calls with `keep_default_na=False` to retain empty values as-is. - Refined logic for suffix addition by excluding empty and `-xxxx` suffixed IDs. - Improved test isolation by maintaining scenario-specific unique identifiers.
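For example, with `keep_default_na=False` empty cells survive as empty strings rather than becoming `NaN`, so "missing" values round-trip through the fixture rewrite untouched:

```python
import io

import pandas as pd

csv_text = "well_name_point_id,depth_to_water_ft\nTEST-0001,\n,12.5\n"

# Default parsing would turn empty cells into NaN; keep_default_na=False
# preserves them as empty strings, matching what the importer expects.
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
```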
…nd `DataQuality` - Changed `SampleMethodField` to validate against `SampleMethod` instead of `OriginType` - Changed `DataQualityField` to validate against `DataQuality` instead of `OriginType`
… import - Make contact.role and contact.contact_type nullable in the ORM and migrations - Update contact schemas and well inventory validation to accept missing values - Allow contact import when name or organization is present without role/type
- Stop round-tripping CSV fixtures through pandas to avoid rewriting structural test cases - Preserve repeated header rows and duplicate column fixtures so importer validation is exercised correctly - Keep the blank contact name/organization scenario focused on a single invalid row for stable assertions
…n errors - Prevent one actual validation error from satisfying multiple expected assertions (avoids false positives) - Keep validation matching order-independent while requiring distinct matches (preserves flexibility) - Tighten BDD error checks without relying on exact error text (improves test precision)
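The distinct-match idea can be sketched as a small helper (name and signature are assumptions): each expected substring must consume a different actual error, which keeps matching order-independent while blocking false positives.

```python
def match_expected_errors(expected_substrings, actual_errors):
    """Return True only if every expected substring matches a
    *different* actual error message, in any order."""
    remaining = list(actual_errors)
    for expected in expected_substrings:
        for i, actual in enumerate(remaining):
            if expected in actual:
                del remaining[i]  # consumed; cannot satisfy another expectation
                break
        else:
            return False  # no unconsumed error matched this expectation
    return True
```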
…behavior - Update partial-success scenarios to expect valid rows to import alongside row-level validation errors - Reflect current importer behavior for invalid lexicon, invalid date, and repeated-header cases - Keep BDD coverage focused on user-visible import outcomes instead of outdated all-or-nothing assumptions
…sitive parsing - Update unit expectations to accept lowercase placeholder tokens that are now supported - Document normalization of mixed-case and spaced placeholder formats to uppercase prefixes - Keep test coverage aligned with importer behavior and reduce confusion around valid autogen inputs
…DataQuality` - Adjust test data to reflect updated descriptions for `sample_method` and `data_quality` fields.
…ization scenarios - Add test to ensure contact creation returns None when both name and organization are missing - Add test to verify contact creation with organization only, ensuring proper dict structure - Update assertions for comprehensive validation of contact fields
@jirhiker Recent updates to the CLI command tests are failing on my branch. Since those changes don't seem to impact well inventory ingestion, should I resolve the test failures within this PR, or would you prefer I handle them in a separate one before re-opening for review?
…s for Sample and Observation - Replace `name_point_id` with `name` in `sample_name` generation - Rename `observation_*` fields for consistency with updated schemas
- Adjust `content.decode` to use `utf-8-sig` for correct header parsing of UTF-8 files with BOM - Prevent encoding issues when processing imported files
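A quick demonstration of why `utf-8-sig` matters here: with plain `utf-8`, the BOM stays glued to the first header name, so column matching silently fails.

```python
# A CSV payload exported with a UTF-8 byte-order mark (BOM).
content = b"\xef\xbb\xbfwell_name_point_id,project\nTEST-0001,Demo\n"

# Plain utf-8 keeps the BOM attached to the first header field.
bad_header = content.decode("utf-8").splitlines()[0].split(",")[0]

# utf-8-sig strips the BOM, so the header parses cleanly.
good_header = content.decode("utf-8-sig").splitlines()[0].split(",")[0]
```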
…h-to-water is blank - Treat blank depth_to_water_ft values as missing instead of invalid numeric input - Create water-level sample and observation records when water_level_date_time is present even if no depth value was obtained - Preserve attempted measurements for dry, obstructed, or otherwise unreadable wells without dropping the observation record
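A sketch of the blank-depth handling described above (helper names are assumptions; the real logic lives in the schema and service layer):

```python
def parse_depth_to_water(raw):
    """Blank values become None (missing) instead of raising a numeric
    validation error; non-blank values still must parse as floats."""
    if raw is None or str(raw).strip() == "":
        return None
    return float(raw)


def should_create_observation(water_level_date_time, depth_to_water_ft):
    """An attempted measurement (timestamp present, no depth obtained,
    e.g. a dry or obstructed well) still gets a sample/observation
    record rather than being dropped."""
    return water_level_date_time is not None
```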
jacob-a-brown
left a comment
Overall I think it's well done and well documented! I like your comments and how the code is organized
If monitoring_status is added in CreateWell then I think that it should also be a field in WellResponse
```python
is_suitable_for_datalogger: bool | None = None
is_open: bool | None = None
well_status: str | None = None
monitoring_status: str | None = None
```
This should be restricted to MonitoringStatus enum values
```python
        f" Zone={self.utm_zone}"
    )

if self.depth_to_water_ft is not None:
```
There should be a check here for mp_height, too, since it should be required for every observation.
```python
    else (model.sample_method or "Unknown")
)
sample = Sample(
    field_activity_id=fa.id,
```
this should evaluate to the newly created "groundwater level" FieldActivity record, not the "well inventory" FieldActivity
```python
@then("{count:d} wells are imported")
@then("{count:d} well is imported")
def step_then_count_wells_are_imported(context: Context, count: int):
    response_json = context.response.json()
    wells = response_json.get("wells", [])
    validation_errors = response_json.get("validation_errors", [])
    assert (
        len(wells) == count
    ), f"Expected {count} wells to be imported, but got {len(wells)}: {wells}. Errors: {validation_errors}"
```
If partial steps are not allowed then I don't think that this step will be necessary anymore
```python
def _minimal_valid_well_inventory_row():
    return {
        "project": "Test Project",
        "well_name_point_id": "TEST-0001",
        "site_name": "Test Site",
        "date_time": "2025-02-15T10:30:00",
        "field_staff": "Test Staff",
        "utm_easting": 357000,
        "utm_northing": 3784000,
        "utm_zone": "13N",
        "elevation_ft": 5000,
        "elevation_method": "Global positioning system (GPS)",
        "measuring_point_height_ft": 3.5,
    }
```
Since this is reusable I think that it should be a pytest fixture
```python
{
    "water_level_date_time": "2025-02-15T10:30:00",
    "depth_to_water_ft": "",
    "sample_method": "Steel-tape measurement",
    "data_quality": "Water level accurate to within two hundreths of a foot",
    "water_level_notes": "Attempted measurement",
    "mp_height_ft": 2.5,
}
```
This is a style comment, but if you put this dictionary in a variable you can access the values in subsequent assert statements. Then if something changes in the future you only need to update the key-value pair and the asserts should still hold.
```python
{
    "measuring_person": "Tech 1",
    "sample_method": "Steel-tape measurement",
    "water_level_date_time": "2025-02-15T10:30:00",
    "mp_height_ft": 2.5,
    "level_status": "Static",
    "depth_to_water_ft": 11.2,
    "data_quality": "Water level accurate to within two hundreths of a foot",
    "water_level_notes": "Initial reading",
}
```
See above style comment about setting the dictionary to a variable
- Use detailed error messages from `DatabaseError` for better debugging
…rganization terms - Treat blank contact organization and well status values as missing instead of persisting empty strings - Prevent foreign key failures caused by empty organization and status lexicon references during import - Add newly encountered organization terms to the lexicon so valid contact records can persist successfully
- Detect previously imported well inventory rows before inserting related records - Skip recreating field activity water-level samples and observations when the same row is reprocessed - Return serializable existing-row results so CLI reruns report cleanly instead of crashing
kas-BDMS-626-resolve-database-errors
Each contact should have a role and contact_type
Why
This PR addresses the following problem / context:
`public_availability_acknowledgement`, `monitoring_status`, well/water notes, and water level observations were not being persisted correctly.

How
Implementation summary - the following was changed / added / removed:
public_availability_acknowledgementnow maps toLocation.release_status(True → public, False → private, unset → draft).monitoring_statusis now written to theStatusHistorytable.well_notesandwater_notesare now stored as polymorphic notes on theThing.depth_to_water_ftnow auto-creates aSampleandObservationlinked to the field activity.WellInventoryRowschema:site_name,elevation_ft,elevation_method,measuring_point_height_ft, anddepth_to_water_ftoptional.elevation_method,depth_source,well_pump_type,monitoring_status,sample_method, anddata_quality.flexible_lexicon_validatorfor case-insensitive, whitespace-tolerant matching.contact_nameorcontact_organization, and madewater_level_date_timeonly required whendepth_to_water_ftis present.depth_to_water_ftNonevalidation_errorslist.UnboundLocalErrorcaused by auto-generated well IDs"Database error"fields.commit=Falsesupport to allowservices/thing_helper.pyandservices/contact_helper.pyo participate in the outer best-effort transaction without prematurely committing.tests/features/data/to use valid lexicon terms and properly quoted comma-containing values.Givensteps to isolate tests and prevent primary key conflicts.well-inventory-csv.featureto assert partial success (e.g., "1 well is imported") in negative scenarios. All 44 scenarios now pass.Notes
Any special considerations, workarounds, or follow-up work to note?